Multi-objective infinite-horizon discounted Markov decision processes
Similar resources
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, . . . , πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...
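As a rough illustration of the setting in this abstract, the sketch below runs Value Iteration on a γ-discounted MDP and records the greedy policy produced at each iteration, giving the sequence π1, . . . , πk. The function name `value_iteration`, the data layout of `P` and `R`, and the two-state toy MDP are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the paper's performance bounds.

```python
def value_iteration(P, R, gamma, k):
    """Run k sweeps of value iteration on a finite discounted MDP.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward (hypothetical layout chosen for this sketch).
    Returns the final value estimate and the greedy policy of every sweep.
    """
    n = len(P)
    v = [0.0] * n
    policies = []
    for _ in range(k):
        # One-step lookahead values Q(s, a) under the current estimate v.
        q = [[R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n)]
        # Greedy policy for this sweep, then the Bellman optimality backup.
        policies.append([max(range(len(q[s])), key=q[s].__getitem__)
                         for s in range(n)])
        v = [max(q[s]) for s in range(n)]
    return v, policies


# Toy MDP (assumed): two states, two deterministic actions each.
P = [[[(0, 1.0)], [(1, 1.0)]],   # state 0: stay, or move to state 1
     [[(1, 1.0)], [(0, 1.0)]]]   # state 1: stay, or move to state 0
R = [[0.0, 1.0],
     [2.0, 0.0]]

values, policies = value_iteration(P, R, gamma=0.9, k=200)
last_m = policies[-3:]  # the last m = 3 generated policies
```

A non-stationary policy of the kind the abstract mentions would cycle through entries of `last_m` rather than applying only `policies[-1]` at every step.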
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes
We consider a nonhomogeneous stochastic infinite horizon optimization problem whose objective is to minimize the overall average cost per period of an infinite sequence of actions (average optimality). Optimal solutions to such problems will in general be non-stationary. Moreover, a solution which initially makes poor decisions, and then selects wisely thereafter, can be average optimal. Howeve...
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes
We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs a...
Multi-objective discounted dynamic programming The Neighbour Search approach to construct Pareto sets of multi-objective Markov Decision Processes
The Neighbour Search (NS) algorithm is an iterative method for constructing Pareto sets of multi-dimensional polytopes. An NS iteration consists of two steps: Edges Exploration and Neighbour Detection. Edges Exploration takes a Pareto vertex and determines all Pareto edges connecting such a Pareto vertex to its neighbours. Each neighbour is again a Pareto vertex that is obtained by Neighbour De...
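To make the notion of a Pareto set concrete, the following sketch filters a finite list of value vectors down to the non-dominated ones, assuming every objective is maximised. This is a plain brute-force dominance filter, not the Neighbour Search algorithm described in the abstract; the names `dominates` and `pareto_set` and the sample points are hypothetical.

```python
def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly
    better in at least one (all objectives maximised, by assumption)."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_set(points):
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective value vectors, e.g. one per candidate policy.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_set(candidates)
```

Here `(2, 2)` and `(1, 1)` are dropped because `(3, 3)` is better in both objectives, while the remaining three vectors trade one objective against the other.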
Journal
Journal title: Journal of Mathematical Analysis and Applications
Year: 1982
ISSN: 0022-247X
DOI: 10.1016/0022-247x(82)90122-6